Integrating Verb-Particle Constructions into CCG Parsing
Abstract
Despite their prevalence in English, multiword expressions such as verb-particle constructions (VPCs) are often poorly handled by NLP systems. This problem stems partly from inadequacies in existing corpora: the primary corpus for CCG-oriented work, CCGbank, does not represent VPCs explicitly and handles them inconsistently. In this paper, we apply corrective transformations to CCGbank and then use the modified corpus to retrain an augmented version of the Clark and Curran CCG parser. With our technique, we observe no significant change in F-score, while the resulting parses are semantically more sound.
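The abstract does not spell out the transformations themselves. As a rough illustration only, the following sketch shows one way a corrective pass over CCG-annotated tokens could look. It assumes, hypothetically, that particles are identified by the Penn Treebank tag RP, that they receive a dedicated atomic category here called PR, and that a verb immediately followed by a particle gains an extra rightward PR argument. The `Token` type, the function names, and the input categories are illustrative and are not taken from the paper or from CCGbank.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical atomic category for particles (an assumption for this sketch).
PARTICLE = "PR"


@dataclass(frozen=True)
class Token:
    word: str
    pos: str  # Penn Treebank tag, e.g. VBD, RP
    cat: str  # CCG category string, e.g. (S[dcl]\NP)/NP


def add_particle_argument(cat: str) -> str:
    """Add one rightward PR argument as the verb's outermost argument."""
    # Complex categories must be bracketed before another slash is added.
    inner = f"({cat})" if ("/" in cat or "\\" in cat) else cat
    return f"{inner}/{PARTICLE}"


def mark_adjacent_vpcs(tokens: List[Token]) -> List[Token]:
    """Rewrite a verb followed directly by an RP particle as a VPC.

    Returns a new list; the input tokens are left unchanged.
    Split VPCs (verb ... object ... particle) are not handled.
    """
    out = list(tokens)
    for i in range(len(out) - 1):
        verb, nxt = out[i], out[i + 1]
        if verb.pos.startswith("VB") and nxt.pos == "RP":
            out[i] = replace(verb, cat=add_particle_argument(verb.cat))
            out[i + 1] = replace(nxt, cat=PARTICLE)
    return out


if __name__ == "__main__":
    # Illustrative input categories, not copied from CCGbank.
    sent = [
        Token("She", "PRP", "NP"),
        Token("picked", "VBD", "(S[dcl]\\NP)/NP"),
        Token("up", "RP", "(S\\NP)\\(S\\NP)"),
        Token("the", "DT", "NP[nb]/N"),
        Token("ball", "NN", "N"),
    ]
    for t in mark_adjacent_vpcs(sent):
        print(t.word, t.cat)  # "picked" becomes ((S[dcl]\NP)/NP)/PR, "up" becomes PR
```

Because forward application consumes the outermost argument first, picked := ((S[dcl]\NP)/NP)/PR combines with up := PR to yield (S[dcl]\NP)/NP, which then takes the object NP. Split constructions such as "picked the ball up" would need a different argument order and are outside the scope of this sketch.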
Related papers
Hindi CCGbank: CCG Treebank from the Hindi Dependency Treebank
In this paper, we present an approach for automatically creating a Combinatory Categorial Grammar (CCG) treebank from a dependency treebank for the Subject-Object-Verb language Hindi. Rather than a direct conversion from dependency trees to CCG trees, we propose a two-stage approach: a language-independent generic algorithm first extracts a CCG lexicon from the dependency treebank. A determinis...
The Headedness of Mandarin Chinese Serial Verb Constructions: A Corpus-Based Study
Existing treebanks of Mandarin Chinese, such as the Sinica Treebank, the Harbin Institute of Technology Treebank, and the Penn Chinese Treebank, parse Chinese serial verb constructions incorrectly or inconsistently in terms of headedness, i.e. which verb should be assigned the label of syntactic and/or semantic “head”. Aspectual markers in serial verb constructions can help determine the head o...
Discontinuous Verb Phrases in Parsing and Machine Translation of English and German
In this paper, we focus on the verb-particle (V-Prt) split construction in English and German and its difficulty for parsing and Machine Translation (MT). For German, we use an existing test suite of V-Prt split constructions, while for English, we build a new and comparable test suite from raw data. These two data sets are then used to perform an analysis of errors in dependency parsing, word-...
Integrating support verb constructions into a parser
This paper describes the process of integrating into a rule-based parser a set of approximately 1,000 nominal predicates forming support verb constructions (SVCs) with the verb dar ‘give’ in Brazilian Portuguese. The system was evaluated on a sample of 580 manually and independently annotated sentences containing verb-noun combinations that are candidate SVCs. The best results yield 85% precision, 79% re...
Incremental Derivations in CCG
This paper presents a research note on the degree to which strictly incremental derivations (that is, derivations that are fully connected at each point in time) are possible in Combinatory Categorial Grammar (CCG). There has been a recent surge of interest in incremental parsing both from the psycholinguistic community in a bid to build psycholinguistically plausible models of language compreh...